Investigation of Random Subspace and Random Forest Methods Applied to Property Valuation Data

Authors

  • Tadeusz Lasota
  • Tomasz Luczak
  • Bogdan Trawinski
Abstract

Experiments were conducted to compare the predictive accuracy of random subspace and random forest models with that of bagging ensembles and single models, using two popular algorithms: the M5 tree and the multilayer perceptron. All tests were carried out in the WEKA data mining system within the framework of 10-fold cross-validation and repeated holdout splits. A comprehensive real-world cadastral dataset, comprising over 5200 samples recorded over 11 years, served as the basis for benchmarking the methods. The overall results of our investigation were as follows: the random forest turned out to be superior to the other tested methods, the bagging approach outperformed the random subspace method, and single models provided worse prediction accuracy than any of the ensemble techniques.
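
As a rough illustration of how the compared ensemble methods differ, the sketch below shows how each one might build the training view for a single ensemble member, and how 10-fold cross-validation partitions the samples. It is not the authors' WEKA setup: the toy data, the function names (`bagging_view`, `random_subspace_view`, `kfold_indices`), and the choice of contiguous folds are all assumptions made for illustration.

```python
import random

def bagging_view(rows, rng):
    """Bagging: draw len(rows) rows with replacement, keeping all features."""
    return [rng.choice(rows) for _ in rows]

def random_subspace_view(rows, n_features, k, rng):
    """Random subspace: keep every row, but only k randomly chosen feature columns."""
    cols = sorted(rng.sample(range(n_features), k))
    return [[row[c] for c in cols] for row in rows], cols

def kfold_indices(n, k=10):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

rng = random.Random(0)
# Hypothetical toy data: 20 rows with 4 features each (not the cadastral dataset).
rows = [[rng.random() for _ in range(4)] for _ in range(20)]

bag = bagging_view(rows, rng)                                   # resampled rows, all columns
sub, cols = random_subspace_view(rows, n_features=4, k=2, rng=rng)  # all rows, 2 columns
folds = kfold_indices(len(rows), k=10)                          # each fold is held out once
```

A random forest, in its usual formulation, combines both ideas: each tree is trained on a bootstrap sample of the rows and considers only a random subset of features when choosing a split. In every case the ensemble's prediction for a regression task is typically the average of its members' predictions.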

Similar resources

Investigation of Property Valuation Models Based on Decision Tree Ensembles Built over Noised Data

Ensemble machine learning methods incorporating bagging, random subspace, random forest, and rotation forest, with decision trees (namely pruned model trees) as base learning algorithms, were developed in the WEKA environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. T...

Investigation of Random Subspace and Random Forest Regression Models Using Data with Injected Noise

Ensemble machine learning methods incorporating random subspace and random forest, with genetic fuzzy rule-based systems as base learning algorithms, were developed in the Matlab environment. The methods were applied to the real-world regression problem of predicting the prices of residential premises based on historical data of sales/purchase transactions. The accuracy of ensembles generate...

Comparison of Random Forest and Logistic Regression Methods in Predicting Mortality in Colorectal Cancer Patients and its Related Factors

Background and Objectives: The purpose of this study was to predict the mortality rate of colorectal cancer in Iranian patients and to determine the factors affecting mortality in patients with colorectal cancer, using random forest and logistic regression methods. Methods: Data from 304 patients with colorectal cancer registry from the Gastroenterology and Liver Research Center of Shah...

Forest Stand Types Classification Using Tree-Based Algorithms and SPOT-HRG Data

Forest type mapping is one of the most necessary elements of forest management and silvicultural treatments. Traditional methods such as field surveys are time-consuming and cost-intensive. Improvements in remote sensing data sources and classification-estimation methods are creating new opportunities for obtaining more accurate maps of forest biophysical attributes. This research co...

Valuing Benefits of Finnish Forest Biodiversity Conservation – Logit Models for Pooled Contingent Valuation and Contingent Rating/Ranking Survey Data

This paper examines contingent valuation and contingent rating/ranking valuation methods (CV and CR methods) for measuring willingness-to-pay (WTP) for nonmarket goods. Recent developments in discrete choice econometrics using random parameter models are applied to CV and CR data and their performance evaluated in comparison to conventionally used fixed parameter models. A framework for using d...

Publication date: 2011